Knowledge of language origin improves pronunciation accuracy of proper names
Authors
Abstract
Because no lexicon can have complete coverage, and a high proportion of unknown words are proper names, this paper addresses the problem of automatically finding pronunciations for unseen proper names in US English. Proper names, especially in the US, may come from a wide range of ethnic backgrounds. We present a model and results showing that including the ethnic origin of words in a statistical model can improve pronunciation results. We used a lexicon of 56,000 proper names from CMUDICT. We also gathered data (text and proper names) from 26 languages to build statistical models that provide an estimate of word origin. Tests against held-out data showed a 7.6% absolute improvement over a baseline of 54.8% (that is, 62.4%) when language-based features were added to our CART-based model. As there are potentially multiple correct pronunciations, we synthesized a random sample of names that did not match the “correct” answer in our test set. Human listeners showed a 17% preference for the model with language features compared to the baseline.
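The abstract does not say what form the word-origin models take. As a rough illustration only, the sketch below assumes a smoothed letter-trigram model per language and scores a name against each one; the per-language log-probabilities are the kind of value that could then be passed as features to a decision-tree (CART-style) pronunciation model. The function names, the smoothing constant, and the tiny name lists are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def trigrams(word):
    """Return letter trigrams of a word, padded so edges are modeled too."""
    padded = f"##{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(names_by_language):
    """Count trigrams per language (hypothetical stand-in for the origin models)."""
    models = {}
    for language, names in names_by_language.items():
        counts = Counter(t for name in names for t in trigrams(name))
        models[language] = (counts, sum(counts.values()))
    return models

def origin_scores(word, models, vocab_size=27 ** 3):
    """Add-one smoothed log-probability of the word under each language model.

    vocab_size assumes 26 letters plus the padding symbol; this is an
    illustrative choice, not a value from the paper.
    """
    scores = {}
    for language, (counts, total) in models.items():
        scores[language] = sum(
            math.log((counts[t] + 1) / (total + vocab_size))
            for t in trigrams(word)
        )
    return scores

# Toy training data, invented for illustration (the paper used 26 languages).
TOY_NAMES = {
    "italian": ["rossi", "bianchi", "ricci", "marino", "lombardi", "moretti"],
    "german": ["schmidt", "schneider", "fischer", "schulz", "hoffmann", "schwarz"],
}

models = train(TOY_NAMES)
scores = origin_scores("schubert", models)
best = max(scores, key=scores.get)
print(best, scores)
```

In a setup like this, the full score vector (one value per language) would probably be more useful to the downstream model than the single best guess, since many names are plausible in several languages.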
Similar resources
Improving Pronunciation Accuracy of Proper Names with Language Origin Classes
Pronunciation of proper names that have different and varied language sources is an extremely hard task, even for humans. This thesis presents an attempt to improve automatic pronunciation of proper names by modeling the way humans do it, and tries to eliminate synthesis errors that humans would never make. It does so by taking into account the different language and language family sources and...
Basis Identification for Automatic Creation of Pronunciation Lexicon for Proper Names
Development of a proper names pronunciation lexicon is usually a manual effort which cannot be avoided. Grapheme to phoneme (G2P) conversion modules in the literature are usually rule based and work best for non-proper names in a particular language. Proper names are foreign to a G2P module. We follow an optimization approach to enable automatic construction of proper names pronunciation lexicon...
G2P Conversion of Proper Names Using Word Origin Information
Motivated by the fact that the pronunciation of a name may be influenced by its language of origin, we present methods to improve pronunciation prediction of proper names using word origin information. We train grapheme-to-phoneme (G2P) models on language-specific data sets and interpolate the outputs. We perform experiments on US surnames, a data set where word origin variation occurs naturall...
Generating proper name pro for automatic speech
Generating correct pronunciation of proper names remains one of the most difficult tasks in text-to-phoneme transcription. Although phonetic rules can be efficient in processing proper names of one language, foreign family names cannot always be correctly generated without additional pronunciation rules. The present study addresses the problem of pronunciation variants for French and foreign fa...
Proper Name Machine Translation from Japanese to Japanese Sign Language
This paper describes machine translation of proper names from Japanese to Japanese Sign Language (JSL). “Proper name transliteration” is a kind of machine translation of proper names between spoken languages and involves character-to-character conversion based on pronunciation. However, transliteration methods cannot be applied to Japanese-JSL machine translation because proper names in JSL are ...
Publication date: 2001